Primary Key Selection

Choosing the primary key of an entity is an important step that requires serious consideration. Before you actually select a primary key, consider several attributes, which are referred to as candidate key attributes. Typically, the business user who knows the business and business data can help identify candidate keys.

For example, to use the EMPLOYEE entity in a data model (and later in a database) correctly, you uniquely identify instances. In the customer table, you could choose from several potential key attributes including: the employee name, a unique employee number assigned to each instance of EMPLOYEE, or a group of attributes, such as name and birth date.

The rules that you use to select a primary key from the list of all candidate keys are stringent. The rules can be consistently applied across all types of databases and information. The rules state that the attribute or attribute group must:

Uniquely identify an instance.
Never include a NULL value.
Not change over time. An instance takes its identity from the key. If the key changes, it is a different instance.
Be as short as possible, to facilitate indexing and retrieval. If you must use a key that is a combination of keys from other entities, verify that each part of the key adheres to the other rules.

Example:

Consider which attribute you would select as a primary key from the following list of candidate keys for an EMPLOYEE entity:

employee number
employee name
employee social security number
employee birth date
employee bonus amount

If you use the rules in the preceding list to find candidate keys for EMPLOYEE, you could compose the following analysis of each attribute:

�employee number� is a candidate key because it is unique for all EMPLOYEEs
�employee name� is probably not a good candidate because multiple employees can have the same name, such as Mary Jones.
�employee social security number� is unique in most instances, but every EMPLOYEE may not have one.
The combination of �employee name� and �employee birth date� may work, unless there is more than one John Smith born on the same date and employed by our company. This combination could be a candidate key.
Only some EMPLOYEEs of our company are eligible for annual bonuses. Therefore, �employee bonus amount� can be expected to be NULL in many cases. As a result, it cannot be part of any candidate key.

After analysis, there are two candidate keys. One is �employee number� and the other is the group of attributes containing �employee name� and �employee birth date.� �employee number� is selected as the primary key because it is the shortest and helps ensure uniqueness of instances.

When choosing the primary key for an entity, data architects often assign a surrogate key. A surrogate key is an arbitrary number that is assigned to an instance to identify it within an entity uniquely. �employee number� is an example of a surrogate key. A surrogate key is often the best choice for a primary key. A surrogate key is short, can be accessed the fastest, and helps ensure unique identification of each instance. The system can also automatically generate surrogate keys so that numbering is sequential and does not include any gaps.

A primary key chosen for the logical model is not always the primary key used to access the table efficiently in a physical model. The primary key can be changed to suit the needs and requirements of the physical model and database at any point.